13 May, 2020


Introduction

Data set:

  • breast cancer

  • proteomics by mass spectrometry

  • four cancer classes:


Goal:

  • Explore the data to identify patterns

  • Build models that predict the breast cancer class

Project’s GitHub repository

Materials and Methods

Course-related packages

  • tidyverse
  • broom
  • patchwork
  • keras/tensorflow

Additional packages (not covered in the course)

  • gridExtra

Results — no outliers in total protein expression

Results — breast cancer classes in the dataset are well represented

Results — breast cancer classes do not differ by age

Results — breast cancer and gender

Results — protein expression heatmap

Results — dimensionality reduction
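The slide does not name the dimensionality-reduction method. Assuming principal component analysis (PCA), a common choice for proteomics data, the sketch below shows the basic base-R steps on a hypothetical toy matrix (`expr`, random values standing in for the real patients-by-proteins data): scale each protein, compute the components, and keep the first two for plotting.

```r
# Assumption: PCA is the reduction method; `expr` is hypothetical toy data,
# rows = patients, columns = proteins (not the project's real data set).
set.seed(2020)
expr <- matrix(rnorm(40 * 10), nrow = 40, ncol = 10,
               dimnames = list(NULL, paste0("protein_", 1:10)))

# Centre and scale each protein so no single protein dominates the components
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Patient coordinates on the first two components, ready for a scatter plot
scores <- as.data.frame(pca$x[, 1:2])
head(scores)
```

Scaling matters here because protein abundances can span very different ranges; without it, the most abundant proteins would dominate the first components.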

Results — K-means clustering
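K-means is unsupervised, so reporting an accuracy for it (72.7% in the Discussion) requires matching clusters to the known cancer classes. The slides do not say how this was done. One common approach, assumed here, labels each cluster with its most frequent class and then scores the resulting predictions. The sketch uses hypothetical toy data (`x`, `true_class`) with three well-separated groups, not the project's data.

```r
# Assumption: clusters are mapped to classes by majority vote.
# `x` and `true_class` are hypothetical toy data with three separated groups.
set.seed(42)
true_class <- rep(c("A", "B", "C"), each = 20)
centres <- c(A = 0, B = 5, C = 10)
x <- cbind(rnorm(60, centres[true_class]), rnorm(60, centres[true_class]))

# Several random starts reduce the risk of a poor local optimum
km <- kmeans(x, centers = 3, nstart = 25)

# Cross-tabulate cluster ids (rows) against the known classes (columns)
tab <- table(cluster = km$cluster, class = true_class)

# Label each cluster with its most frequent class
cluster_label <- colnames(tab)[apply(tab, 1, which.max)]
predicted <- cluster_label[km$cluster]

# Fraction of samples whose cluster label matches their true class
accuracy <- mean(predicted == true_class)
accuracy
```

Majority voting can assign the same class to two clusters, which makes this score somewhat optimistic; a one-to-one matching between clusters and classes is a stricter alternative.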

Results — ANN model’s structure

Results — ANN performance

Discussion

  • Accuracy: K-means clustering 72.7% vs. ANN model 82.8%


  • Collect more data to build more reliable models


  • Combine proteome data with RNA-seq data to investigate further associations (e.g. through network analysis)


  • The tidyverse R packages provide an elegant toolkit for data analysis and visualization

The end